Safe Learning for Near-Optimal Scheduling
Authors
Abstract
In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) that have $10^{20}$ states and beyond, which cannot be handled with state-of-the-art probabilistic model-checkers. We provide probably approximately correct (PAC) guarantees for learning the model. Additionally, we extend Monte-Carlo tree search with advice, computed using safety games or obtained from the earliest-deadline-first scheduler, to safely explore the learned model online. Finally, we implemented our algorithms and compared them empirically against shielded deep Q-learning on large systems.
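The abstract names the earliest-deadline-first (EDF) scheduler as one source of advice for tree search. The following is a minimal sketch of preemptive EDF under a deliberately simplified, deterministic task model: a single processor, unit time steps, and jobs described by a hypothetical `Job` record holding release time, remaining work, and absolute deadline. It is not the paper's stochastic task model or its implementation, and the names `Job` and `edf_schedule` are illustrative only.

```python
import heapq
from dataclasses import dataclass, replace


@dataclass
class Job:
    """Hypothetical job record: one instance of a preemptible task."""
    name: str
    release: int   # time step at which the job becomes ready
    work: int      # units of computation still required (assumed >= 1)
    deadline: int  # absolute deadline: the job must finish by this time


def edf_schedule(jobs, horizon):
    """Simulate preemptive earliest-deadline-first on one processor.

    Time advances in unit steps. Returns (trace, missed), where trace[t]
    names the job run during [t, t + 1) (or None if idle) and missed
    lists the jobs that were still unfinished at their deadline.
    """
    # Work on copies so the caller's Job objects are not mutated.
    pending = sorted((replace(j) for j in jobs), key=lambda j: j.release)
    ready = []  # min-heap of (deadline, arrival index, job)
    trace, missed = [], []
    i = 0
    for t in range(horizon):
        # Release every job whose release time has been reached.
        while i < len(pending) and pending[i].release <= t:
            job = pending[i]
            heapq.heappush(ready, (job.deadline, i, job))
            i += 1
        # A job with remaining work and deadline <= t can no longer finish in time.
        while ready and ready[0][0] <= t:
            missed.append(heapq.heappop(ready)[2].name)
        if not ready:
            trace.append(None)  # processor idles
            continue
        # Preemption is implicit: every step re-selects the earliest deadline.
        job = ready[0][2]
        job.work -= 1
        trace.append(job.name)
        if job.work == 0:
            heapq.heappop(ready)
    # Jobs still unfinished when the horizon ends, with a deadline inside it.
    missed.extend(job.name for _, _, job in ready if job.deadline <= horizon)
    return trace, missed


if __name__ == "__main__":
    jobs = [Job("A", 0, 3, 7), Job("B", 1, 2, 4), Job("C", 2, 1, 10)]
    print(edf_schedule(jobs, 8))
    # (['A', 'B', 'B', 'A', 'A', 'C', None, None], [])
```

In this run, B preempts A at time 1 because B's deadline (4) is earlier than A's (7). For independent jobs on a single preemptive processor, EDF meets all deadlines whenever any schedule can. How the paper turns such a scheduler into advice for Monte-Carlo tree search is not reproduced here.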
Similar resources
Near-optimal Regret Bounds for Reinforcement Learning
This technical report is an extended version of [1]. For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ i...
Near-Optimal Course Scheduling at the Technion
The focus of this article is the automation of course, classroom, and exam scheduling for the faculty of Industrial Engineering (IE) at the Technion in Haifa, Israel. The system, called the Technion Industrial Engineering Scheduler (TieSched), has been operational since 2012. It is based on a distributed collection of constraints and multiple engines running in parallel, including SAT, pseudo-B...
Near-optimal Regret Bounds for Reinforcement Learning
For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s′ there is a policy which moves from s to s′ in at most D steps (on average). We present a reinfo...
Near optimal algorithms for scheduling independent chains in BSP
The aim of this work is to show that scheduling a set of independent chains on a parallel machine under the BSP model is a difficult optimization problem which can be easily approximated in practice. BSP is a machine independent computational model which is becoming more and more popular [7]. Finding the optimal solution when the number of processors is fixed is shown to be hard. Efficient heur...
Near-Optimal Scheduling for LTL with Future Discounting
We study synthesis of optimal schedulers for the linear temporal logic (LTL) with future discounting. The logic, introduced by Almagor, Boker and Kupferman, is a quantitative variant of LTL in which an event in the far future has only discounted contribution to a truth value (that is a real number in the unit interval [0, 1]). The precise problem we study—it naturally arises e.g. in search for ...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-85172-9_13